fix(render): correctly escape non-BMP Unicode in AsciiJSON #4693
Open
xd-sarthak wants to merge 1 commit into
AsciiJSON escaped every non-ASCII rune with "\u%04x", which only yields a
valid escape for the Basic Multilingual Plane (U+0000-U+FFFF). For a code
point above U+FFFF such as U+1F600 it emitted five hex digits ("\u1f600").
A JSON parser reads \u as exactly four hex digits, so this decoded to "ὠ0"
instead of "😀".
Per RFC 8259, code points above U+FFFF must be encoded as a UTF-16
surrogate pair (two \uXXXX escapes). Detect r > 0xFFFF and emit the pair via
unicode/utf16.EncodeRune. ASCII and BMP output is unchanged.
Add a regression test asserting AsciiJSON output is ASCII-only and
round-trips back to the original value.
Fixes gin-gonic#4688
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##           master    #4693      +/-   ##
==========================================
- Coverage   99.21%   98.38%   -0.83%
==========================================
  Files          42       48       +6
  Lines        3182     3164      -18
==========================================
- Hits         3157     3113      -44
- Misses         17       42      +25
- Partials        8        9       +1
Description
`AsciiJSON` silently corrupts Unicode code points above U+FFFF (emoji, CJK Extension B, etc.). The escape loop formats every non-ASCII rune with `"\u%04x"`, which only produces a valid escape for the Basic Multilingual Plane (U+0000-U+FFFF). For a rune like the grinning face (U+1F600) it emits five hex digits, `\u1f600`.

A JSON parser reads `\u` as exactly four hex digits, so it consumes `\u1f60` and treats the trailing `0` as a literal character. The result is a valid-but-wrong string instead of the original code point.

Per RFC 8259 section 7, code points above U+FFFF must be written as a UTF-16 surrogate pair (two `\uXXXX` escapes). This PR detects `r > 0xFFFF` and emits the pair via the standard library `unicode/utf16.EncodeRune`, which round-trips back to the original character. ASCII and BMP paths are unchanged.
Fixes #4688
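The escaping rule described above can be illustrated with a small standalone sketch. It is not the code in `render/json.go`. The helpers `escapeOld`, `escapeFixed`, and `decode` are hypothetical names introduced here, and the sketch assumes the input contains no quotes or backslashes that would themselves need JSON escaping.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"unicode/utf16"
)

// escapeOld mimics the behaviour described as buggy: every non-ASCII
// rune is formatted with "\u%04x", regardless of its size.
// (Hypothetical helper, not gin's implementation.)
func escapeOld(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r >= 0x80 {
			fmt.Fprintf(&b, "\\u%04x", r)
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// escapeFixed writes runes above U+FFFF as a UTF-16 surrogate pair and
// leaves the ASCII and BMP paths as they were.
// (Hypothetical helper, not gin's implementation.)
func escapeFixed(s string) string {
	var b strings.Builder
	for _, r := range s {
		switch {
		case r > 0xFFFF:
			hi, lo := utf16.EncodeRune(r)
			fmt.Fprintf(&b, "\\u%04x\\u%04x", hi, lo)
		case r >= 0x80:
			fmt.Fprintf(&b, "\\u%04x", r)
		default:
			b.WriteRune(r)
		}
	}
	return b.String()
}

// decode wraps the escaped text in quotes and parses it as a JSON string.
func decode(escaped string) string {
	var out string
	if err := json.Unmarshal([]byte(`"`+escaped+`"`), &out); err != nil {
		panic(err)
	}
	return out
}

func main() {
	in := "😀" // U+1F600
	fmt.Println(escapeOld(in))           // \u1f600
	fmt.Println(decode(escapeOld(in)))   // ὠ0
	fmt.Println(escapeFixed(in))         // \ud83d\ude00
	fmt.Println(decode(escapeFixed(in))) // 😀
}
```

The old form decodes without error, which is why the corruption is silent: the parser sees a valid escape for U+1F60 followed by a literal `0`.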
Changes
- `render/json.go`: escape non-BMP runes as a UTF-16 surrogate pair.
- `render/ascii_nonbmp_test.go`: regression test asserting AsciiJSON output is ASCII-only and round-trips back to the original value.

Notes
- `\uXXXX` escape to a correct surrogate pair. The previous output was invalid for these inputs, so no correct consumer could have depended on it.

Checklist
- master.
- go test ./render/); gofmt + go vet clean.
- `docs/doc.md` not applicable.
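The two properties the regression test is described as asserting (ASCII-only output, and a round trip back to the original value) can be sketched without gin as follows. This is not the content of `render/ascii_nonbmp_test.go`. The helpers `asciiOnly` and `roundTrips` are hypothetical, and the rendered payloads are hand-written literals standing in for real `AsciiJSON` output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// asciiOnly reports whether every byte is in the 7-bit ASCII range.
func asciiOnly(b []byte) bool {
	for _, c := range b {
		if c >= 0x80 {
			return false
		}
	}
	return true
}

// roundTrips decodes rendered JSON and compares it with the expected value.
func roundTrips(rendered []byte, want map[string]string) bool {
	var got map[string]string
	if err := json.Unmarshal(rendered, &got); err != nil {
		return false
	}
	return reflect.DeepEqual(got, want)
}

func main() {
	want := map[string]string{"msg": "😀"}

	// Hand-written stand-ins for the output after and before the fix.
	fixed := []byte(`{"msg":"\ud83d\ude00"}`)
	broken := []byte(`{"msg":"\u1f600"}`)

	fmt.Println(asciiOnly(fixed), roundTrips(fixed, want))   // true true
	fmt.Println(asciiOnly(broken), roundTrips(broken, want)) // true false
}
```

Both payloads pass the ASCII-only check, so only the round-trip comparison distinguishes the corrected output from the old one.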